14. Goals of Data Partitioning
Why Data Partitioning?
Pipelines designed to work with partitioned data fail more gracefully. Smaller datasets, shorter time periods, and related concepts are easier to debug than large datasets, long time periods, and unrelated concepts. Partitioning makes it much simpler to debug and rerun failed tasks, and because only the affected partition has to be redone, it reduces the cost and time of repeating work.
Another benefit shows up in Airflow: if your data is partitioned appropriately, your tasks will naturally have fewer dependencies on each other. Because of this, Airflow can parallelize execution of your DAGs and produce your results faster.
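The two points above can be illustrated with a minimal sketch in plain Python (this is not Airflow code). The records, field names, and the helpers `partition_by_day`, `process_partition`, and `run_all` are all hypothetical, chosen only for illustration. Each daily partition is processed independently, so partitions can run in parallel and a failed day can be rerun on its own.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event records; the field names are illustrative only.
EVENTS = [
    {"day": "2019-01-01", "amount": 10},
    {"day": "2019-01-01", "amount": 5},
    {"day": "2019-01-02", "amount": 7},
    {"day": "2019-01-03", "amount": None},  # bad record: this partition fails
]


def partition_by_day(events):
    """Group events into one partition per calendar day."""
    partitions = defaultdict(list)
    for event in events:
        partitions[event["day"]].append(event)
    return dict(partitions)


def process_partition(events):
    """Sum the amounts in one partition; raises TypeError on a bad record."""
    return sum(event["amount"] for event in events)


def run_all(partitions):
    """Process each partition independently; collect results and failures."""
    results, failed = {}, []
    with ThreadPoolExecutor() as pool:
        # No partition depends on another, so all can be submitted at once.
        futures = {day: pool.submit(process_partition, rows)
                   for day, rows in partitions.items()}
        for day, future in futures.items():
            try:
                results[day] = future.result()
            except TypeError:
                failed.append(day)
    return results, failed


partitions = partition_by_day(EVENTS)
results, failed = run_all(partitions)

# Repair the bad data, then rerun only the partition that failed.
partitions["2019-01-03"] = [{"day": "2019-01-03", "amount": 3}]
rerun_results, rerun_failed = run_all({day: partitions[day] for day in failed})
```

In this sketch the failure in one day does not block the other two days, and the rerun touches only the repaired partition rather than the whole dataset.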
Types of partitioning
SOLUTION:
- Location
- Logical
- Size
- Time
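As a rough illustration, the four types can each be expressed as a different way of grouping the same records. This is a sketch under stated assumptions: the records, the field names (`region`, `kind`, `created`), and the helpers `partition` and `chunk` are hypothetical, and treating location as a region field and size as a maximum row count is one possible reading, not a definition taken from this lesson.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; all field names are illustrative assumptions.
RECORDS = [
    {"kind": "trip", "region": "us-east", "created": date(2019, 1, 1)},
    {"kind": "station", "region": "us-west", "created": date(2019, 1, 1)},
    {"kind": "trip", "region": "us-west", "created": date(2019, 1, 2)},
]


def partition(records, key):
    """Group records by the value that key(record) returns."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


def chunk(records, max_rows):
    """Split records into consecutive chunks of at most max_rows each."""
    return [records[i:i + max_rows] for i in range(0, len(records), max_rows)]


by_location = partition(RECORDS, key=lambda r: r["region"])  # location
by_logic = partition(RECORDS, key=lambda r: r["kind"])       # logical
by_time = partition(RECORDS, key=lambda r: r["created"])     # time
by_size = chunk(RECORDS, max_rows=2)                         # size
```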
Logical partitioning
SOLUTION:
Breaking conceptually related data into discrete groups for processing
Time partitioning
SOLUTION:
Processing data based on a schedule or when it was created
Size partitioning